Distributed System

Author

  • David Bruce Johnson
Abstract

Fault tolerance can allow processes executing in a computer system to survive failures within the system. This thesis addresses the theory and practice of transparent fault-tolerance methods using message logging and checkpointing in distributed systems. A general model for reasoning about the behavior and correctness of these methods is developed, and the design, implementation, and performance of two new low-overhead methods based on this model are presented. No specialized hardware is required with these new methods.

The model is independent of the protocols used in the system. Each process state is represented by a dependency vector, and each system state is represented by a dependency matrix showing a collection of process states. The set of system states that have occurred during any single execution of a system forms a lattice, with the sets of consistent and recoverable system states as sublattices. There is thus always a unique maximum recoverable system state.

The first method presented uses a new pessimistic message logging protocol called sender-based message logging. Each message is logged in the local volatile memory of the machine from which it was sent, and the order in which the message was received is returned to the sender as a receive sequence number. Message logging overlaps execution of the receiver until the receiver attempts to send a new message. Implemented in the V System, the maximum measured failure-free overhead on distributed application programs was under […] percent, and average overhead measured […] percent or less, depending on problem size and communication intensity.

Optimistic message logging can outperform pessimistic logging, since message logging occurs asynchronously. A new optimistic message logging system is presented that guarantees to find the maximum possible recoverable system state, which is not ensured by previous optimistic methods. All logged messages and checkpoints are utilized, and thus some messages received by a process before it was checkpointed may not need to be logged. Although failure recovery using optimistic message logging is more difficult, failure-free application overhead using this method ranged from only a maximum of under […] percent to much less than […] percent.
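The sender-based message logging idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's V System implementation: the class and field names (`Process`, `LogEntry`, `send_seq`, `replay_for`) are hypothetical, the receive sequence number (RSN) is returned synchronously for simplicity, and the overlap of logging with receiver execution is not modeled. It only shows each message being kept in the sender's volatile log together with the RSN returned by the receiver, and a failed receiver's input being rebuilt in RSN order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogEntry:
    """One message kept in the sender's volatile log (hypothetical layout)."""
    dest: str
    send_seq: int
    payload: str
    rsn: Optional[int] = None  # receive sequence number, filled in by the receiver


@dataclass
class Process:
    name: str
    next_send_seq: int = 0
    next_rsn: int = 0
    log: List[LogEntry] = field(default_factory=list)   # messages this process sent
    delivered: List[str] = field(default_factory=list)  # payloads in receive order

    def send(self, dest: "Process", payload: str) -> None:
        # Log the message in the sender's own memory before delivery.
        entry = LogEntry(dest.name, self.next_send_seq, payload)
        self.next_send_seq += 1
        self.log.append(entry)
        # Assumption: the receiver reports its receive order immediately.
        entry.rsn = dest.receive(payload)

    def receive(self, payload: str) -> int:
        # Assign the next RSN and return it to the sender.
        rsn = self.next_rsn
        self.next_rsn += 1
        self.delivered.append(payload)
        return rsn


def replay_for(failed_name: str, senders: List[Process]) -> List[str]:
    """Gather logged messages for a failed process and order them by RSN."""
    entries = [e for s in senders for e in s.log
               if e.dest == failed_name and e.rsn is not None]
    return [e.payload for e in sorted(entries, key=lambda e: e.rsn)]


if __name__ == "__main__":
    a, b, c = Process("A"), Process("B"), Process("C")
    a.send(c, "a1")
    b.send(c, "b1")
    a.send(c, "a2")
    # If C fails, the senders' logs reproduce C's original receive order.
    print(replay_for("C", [a, b]) == c.delivered)
```

Keeping the log at the sender avoids a separate logging step on stable storage, at the cost of needing the RSN from the receiver before the log entry is complete.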

Similar resources

Comportamiento Autónomo del Holón Recurso basado en la Agenda de Producción

The manufacturing systems are unpredictable, distributed and highly dynamic, which demands from the control architecture flexibility, autonomous decision-making capability and fast adaptation in the presence of disturbances that may be in the system. The Holonic and Multi-Agent paradigms have shown to be suitable for the design and modeling of control architectures and the...

A New Algorithm to Implement Causal Ordering

This paper presents a new algorithm to implement causal ordering. Causal ordering was first proposed in the ISIS system developed at Cornell University. The interest of causal ordering in a distributed system is that it is cheaper to realize than total ordering. The implementation of causal ordering proposed in this paper uses logical clocks of Mattern-Fidge (which define a partial order bet...

Modeling of Hierarchical Distributed Systems with Fault-Tolerance

Abstract-This paper addresses some fault-tolerant issues pertaining to hierarchically distributed systems. Since each of the levels in a hierarchical system could have various characteristics, different fault-tolerance schemes could be appropriate at different levels. In this paper, we use stochastic Petri nets (SPN's) to investigate various fault-tolerant schemes in this context. The basic ...

Profiling Communication in Distributed Genetic Algorithms

To what extent is distribution beneficial to the search quality and computational resources used by a genetic algorithm execution? Most distributed genetic algorithms rely on communicating genetic information, in the form of individual solutions, between concurrently evolving populations. Another way of effectively using the additional information generated by the parallel executions is th...

An Algorithm for Understanding of Color Vision

depth and shape in low level visual processing. Based on the computational theory and Parallel Distributed Processing theory, a parallel algorithm for realizing subjective color vision (SCV) is presented in this paper. The paper contains the following sections: the computational theory for low level color vision is described in the first section; then the PDP algorithm of color vision is described. Finally...

Automatic Data Decomposition for Message-Passing Machines

1 Introduction. Distributed-memory message-passing computers are becoming more common these days because they offer significant advantages over shared-memory machines in terms of cost and scalability. However, distributed-memory machines are more difficult to program than shared-memory machines because programmers of distributed-memory machines have to manage low-level tasks like dis...

Publication date: 1989